Listen, Attend and Spell

Authors

  • William Chan
  • Navdeep Jaitly
  • Quoc V. Le
  • Oriol Vinyals
Abstract

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On the Google Voice Search task, LAS achieves a word error rate (WER) of 14.2% without a dictionary or a language model, and 11.2% with language model rescoring over the top 32 beams. In comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 10.9%.
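The two ideas named in the abstract, a pyramidal encoder that shortens the input sequence and an attention step that summarizes the encoder outputs for the decoder, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the recurrent layers are omitted entirely, the pair-wise frame concatenation and plain dot-product scoring are simplifying assumptions, and the names `pyramid_reduce` and `attend`, the feature sizes, and the random inputs are hypothetical.

```python
import numpy as np

def pyramid_reduce(frames):
    """Halve the time resolution by concatenating each pair of adjacent frames."""
    T, d = frames.shape
    if T % 2 == 1:
        # Assumption for this sketch: drop the last frame when T is odd.
        frames = frames[:-1]
    return frames.reshape(-1, 2 * d)

def attend(decoder_state, encoder_states):
    """Dot-product attention: softmax over time steps, then a weighted sum."""
    scores = encoder_states @ decoder_state      # one score per time step
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states           # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(40, 8))   # 40 frames of hypothetical 8-dim filter bank features
h = features
for _ in range(3):                    # three stacked reductions: 40 -> 20 -> 10 -> 5 steps
    h = pyramid_reduce(h)
state = rng.normal(size=h.shape[1])   # stand-in for the speller's decoder state
context, weights = attend(state, h)
print(h.shape, context.shape)
```

Each reduction halves the number of time steps the attention mechanism has to search over, which is the practical motivation for a pyramidal encoder on long speech inputs.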


Similar resources

Improving the Performance of Online Neural Transducer Models

Having a sequence-to-sequence model which can operate in an online fashion is important for streaming applications such as Voice Search. Neural transducer is a streaming sequence-to-sequence model, but has shown a significant degradation in performance compared to nonstreaming models such as Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT. Specifically, we l...


Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural network. In this work, we look at one such sequence-to-sequence model, namely listen, attend and spell (LAS) [1], and explore the possibility of trainin...


Speech recognition for medical conversations

In this paper we document our experiences with developing speech recognition for medical transcription – a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines – a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these mod...


Occupational Therapy Interventions in Students with Specific Learning Disabilities

Background: Students with Specific Learning Disorders (SLD) have deficits in special aspects of the perceptual processes involved in understanding or using language, spoken or written, that may manifest as an inability to listen, think, speak, read, write, spell or do mathematical calculations. Along with teachers, occupational therapists, speech therapists and psychologists help the studen...


Analysis of the Spell of Rainy Days in Lake Urmia Basin using Markov Chain Model

In this study, the frequency and spell of rainy days were analyzed in the Lake Urmia Basin using a Markov chain model. For this purpose, the daily precipitation data of 7 synoptic stations in the Lake Urmia Basin were used for the period 1995-2014. The daily precipitation data at each station were classified into wet and dry states, and the fitness of a first-order Markov chain on the data series was e...



Journal title:
  • CoRR

Volume abs/1508.01211


Publication date: 2015